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Abstract 

We derive a Kronecker product approximation for the micromagnetic long range interactions in a 
collocation framework by means of separable sine quadrature. Evaluation of this operator for struc- 
tured tensors (Canonical format, Tucker format. Tensor Trains) scales below linear in the volume 
size. Based on efficient usage of FFT for structured tensors, we are able to accelerate computations 
to quasi linear complexity in the number of collocation points used in one dimension. Quadratic 
convergence of the underlying collocation scheme as well as exponential convergence in the sepa- 
ration rank of the approximations is proved. Numerical experiments on accuracy and complexity 
confirm the theoretical results. 

1 Introduction 

Micromagnetism is a continuum theory for the treatment of magnetization processes in ferromagnetic 
bodies [9]. In the investigation of ferrormagnetic materials micromagntetic simulations are nowadays 
of high interest [12] and play an essential role in the development of magnetic data storage devices 
[28]. Starting from the total Gibb's free energy functional, containing exchange, magnetocrystalline 
anisotropy and magnetostatic contributions, micromagnetics either solves for local minimum configu- 
rations or solves an equation of motion for the magnetization [9]. Among these components the mag- 
netostatic field is the only non-local term and gives rise to magnetization structures on a length scale 
which is orders of magnitude greater than atomic spacing. Naive implementation of the superposition- 
based integral operators (3) or solvers for the underlying differential equation (Poisson equation (1)) 
yield computational costs proportional to the square of the number of grid points, i.e. 0{N^). Since 
the spacial resolution in such numerical computations has to be high enough for the correct description 
of magnetic domains, a quadratic scaling is almost never feasible. Historically, different methods have 
been proposed to reduce the computation effort. luan and Bertram [32] used the fast Fourier transform 

'Corresponding author, lukas@exl . at, lukas . exl@£hstp . ac. at 
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for evaluating the convolution of the magnetization with demagnetizing tensor. Blue and Scheinfein [8] 
applied a multipole expansion to the integration kernel. Both methods reduce the computational effort 
for magnetostatic field evaluation to 0{N log N). Fredkin and Koehler [14] coupled the finite element 
method with a boundary integral method to treat the open boundary problem in micromagnetics, cf. (1). 
If hierarchical matrices are used to compress the boundary element matrix [13], the computation of the 
boundary element part will be 0{m log m), where m is the number of nodes at the surface. 
Based on developments of approaches addressing high dimensional problems [7],[18] with a solution 
that can be approximated by separable functions, recently also tensor approximation methods [15], [1 1] 
were introduced into micromagnetics. 

In this paper we give a detailed mathematical analysis of the tensor grid method introduced in [11] and 
extend it to a FFT-based method. 

In micromagnetics, the magnetization m (a vector field in a closed and bounded domain Q c R^, which 
is zero outside) generates the so-called stray field or demagnetizing field = -V(p. The scalar potential 
(p is the solution to [21] 

= V • »i, (1) 

whereby \(f>\ = 0{l/r) and \V(p\ = 0{l/r^) as the distance r — > cxj, [9]. These regularity conditions are 
often referred to as open boundary conditions [12]. 

From the fundamental solution to the Laplace operator in it is possible to express as integral 
representation, i.e. 

47rJn \\x-y\\ 4n Jg^i \\x - y\\ 

Integration by parts yields 

m = -^ f m(y)--^^^dy. (3) 

The latter expression makes sense in micromagnetics due to the constraint \m\ = 1 a.e. in Q. [9], which 
impUes m e L~(0). Since the kernels K^'i\x) x^^^ \\xf e L^(Br(0)) for balls Sr(0) centered in 
with R> and p € [1,3/2), Holder's inequality ensures that (3) is well-defined. 

The paper is organized as follows. In Sec. 2 we give an introduction into the two widely used tensor 
formats, i.e. canonical tensors and Tucker tensors, where we also focus on aspects like (best) approx- 
imation from a theoretical and practical point of view. In Sec. 3 we prove quadratic convergence of a 
collocation scheme for the micromagnetic potential operator and derive a Kronecker product approx- 
imation, which is proved to be exponentially convergent. A brief description on sinc-function based 
approximation theory is also given, as well as aspects of recompression of the separable approxima- 
tion. Sec. 4 deals with the FFT acceleration of the potential evaluation, where we derive quasi linear 
complexity. Numerics in the end of the section confirm these results. 

2 Tensor formats 

For an extensive review on structured tensors and some algorithms to compute structured tensor ap- 
proximations see [22] and references therein. The most common arithmetic operations on Tucker and 
canonical tensors are presented in [5]; in addition we give a description of the Hadamard product and 
TT tensors [27], Sec. B. 

The following explanations are carried out for order-3 tensors, since the generalization to higher order 
tensors is straight forward. 
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We denote the set of (order-3) tensors ^ with mode sizes n = «2> ni,) over the field A" = R or C with 



and the set of matrices of size «i x ?i2 as usual with K"'^"^. 



Let ^, J/ £ R"". The Frobenius norm is defined as 



1 (1=1 (2 = 1 '3 = 1 

which is associated with a scalar product 



«l n2 rti 



.2 



(4) 



rii ni «3 

J/> 2 2 Xi^i^i, yi,i^i„ 

il = l !2 = 1 !3 = 1 



(5) 



With||^||2 =(^,^>. 



2.1 Canonical tensors 

A tensor ?( e <S)l=i is said to be in canonical format (CANDECOMP/PARAFAC (CP) decomposi- 
tion) with (outer product) rank R, if 

R 

X^Y^Ar U^r^ o Mp^ o (6) 
r=l 

with Ar € R, (unit) vectors mJ.^^ € R"^ and o is the tensor outer product. Abbreviating notation as in [22], 
a tensor in CP format is written as 

?( = U; U'^\ U^^\ U^^\ (7) 

with weight vector /I = [/^i, . . . , ^r] e R^ and (factor) matrices U^j^ ^ [ uf \ ...\u-^] e BI"'""^. The 
storage requirement for the canonical tensor format amounts to R Y^j^i fij. 

In the following we write Cn,r for the set of canonical tensors with mode sizes n = («i, «2, ni,) and rank 
r, and simple Cn,r, when the mode sizes are equal. 

The tensor (outer-) product rank, i.e. the minimal number of rank-1 terms in a representation like (6) 
for a tensor X, is an analogue of the matrix rank, however there are major differences between those 
two [22]. The product rank of a tensor might be different over R and C; in principle there is no easy 
algorithm to determine the tensor rank since this is an NP-complete problem [19]. In fact, there are 
several specific examples of tensors where only bounds exist for their ranks. 

Moreover the tensor rank is not upper semicontinuous, e.g. there exist sequences of tensors of rank < r 
converging to a tensor of rank greater than r [10]. There is no Eckart-Young Theorem available, i.e. a 
CP decomposition can not be computed via the SVD, indeed, it is possible that the best rank-r approxi- 
mation of a tensor of order greater than two may not even exist for the case r > 1, [10]. 
Nevertheless a broad community uses canonical decomposition, e.g. psychometrics, data mining, neu- 
roscience, image compression and classification, see [22] references theirin. 

Algorithms for computing canonical decompositions are mostly based on optimization, e.g. alternating 
least squares [22], gradient based or nonlinear least squares methods [3] or Gauss-Newton [30]. 
Also approximation of operators like the multiparticle Schrddinger operator [7] or Newton potential 
[18] can be done by using the canonical tensor format in order to overcome the curse of dimensionality. 

'Another common notation is K' where I = Ii x h x 1^ and /,, = {!...«,,), see [22] 
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For matrices A e r( «.)><( U^i "d^ typically arising from discretized operators, the canonical format is 
usually given in Kronecker product form [31] 

r 

A = Y^aj Uf ® Uf ® Uf\ (8) 

with matrices ?7^.*^ € R"?^"?, scalars a, and ® equals Kronecker product. Due to the relation vec(vec(?7(^^)o 
vec([/<2)) o vec([/(i))) = vec([/(^)) ® vec([/(2)) ® vec([/(3)), the form (8) can be identified with (6), where 
the vectorization vec(.) is understood as in [22]. 

Storage and tensor operations for the canonical format scale linearly in the dimension d, rather than 
exponentially as for dense tensors. However, the above mentioned drawbacks (instability, lack of robust 
algorithms) have led to the development of other (stable) formats that scale linearly in the dimension, 
such as 1-1 -Tucker [16] which relies on hierarchical tree structure, or the Tensor Train format [27], which 
briefly discussed in Sec. C. 

2.2 Tucker tensors and quasi-best approximation 

For a matrix U e the j-mode matrix product X Xj U of a tensor X e (S)p=i with U is defined 

element- wise in the following way. E.g. for j = I, 

"1 

{X X 1 U)i^ i, ,3 : ^ ^ Xi' ,2 ,3 M/i , (9) 

i' = \ 

i.e., the resulting tensor X X[ U e (S)p=i where mp = np, p i= 1 and m\ - r, is obtained by 
right- multiplication of the 1-mode fibers (columns) of X by U. Analogously for j = 2, 3; the cost for 
the computation of X Xj U is 0{r Y[]=\ ^j) operations in general. A Tensor X e (^^^j R"'' is said to be 
represented in Tucker format if 

X = Cxy U^^^X2U^^^x^U^^^ = IC;U^^\U^'^\U^^H, (10) 

with the so-called core tensor C e (^^^=1 ^"^^ (factor) matrices ?7^^^ € W^'^^i . The storage require- 
ment for (10) is Y\j=\ fj + 2y=i which is smaller than Y\j=i f^j if 0' 

The j-rank of a tensor X is the rank of the unfolding (j-mode matricization) X(j), [22]. By setting 
rj = ra.nk{X(j)), the tensor X is usually referred to as rank-{r\,r2, r^,) tensor. Of course rj < nj holds. 
In the following we denote the set of Tucker tensors with mode sizes n = (?ii,«2."3) and (7-) ranks 
r = {r\,r2, ri,) with ?"„ r- In fact, Tn r contains all tensors with mode-size n and 7-ranks smaller or equal 
rj, [17]. 

The set Tn,r is closed ^ due to the fact that the set of matrices of rank < r is closed, i.e. for a sequence of 
matrices {Un)nm with ranks < r we have lim„^oo Un -'■ U has rank < r. Thus, if we assume a sequence 
{Xn)nen c T„,r c <S)l=i R""> wc get lim„_^oo rank(^„(y)) < rj. 

In finite dimension, the closedness of the subset Tn r of a normed vector space, e.g. ((^^^j R"'', II ■ IIf), 
ensures the existence of a best-approximation Xh^st £ T'n,/- of an element inX e (^^^j R"'', i-c. 

Il^-^bestllf <ll^-J/||f VJ/€T-„,„ (11) 

see Sec. A Lemma 7. 

An approximation of a given tensor to prescribed j-ranks r was investigated in [25], where the described 

^This also holds for order-d tensors where d > 3 and can be proved along the same lines. 
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algorithm to compute such an approximation (Higher Order Singular Value Decomposition [HOSVD]) 
works by truncating the SVD of the j'-mode unfoldings. The resulting tensor is an approximation to the 
best-approximation in T„ r- Indeed, due to Property 10 in [25] we have for the HOSVD approximation 
''^HOSVD of a tensor X with j-ranks Rj and the descending ordered singular values of the j-th unfolding 
of X denoted with (t'^\ k = I . . .Rj (here formulated for order- 3 tensors) 



A Z Z ^i'^ - ^ll'^-'^bestllf , (12) 

\ j=l k=rj + l 



where Xbs^t is the best approximation of X in Tn^r, cf. [16]. 

Existence of the Tucker approximation was also investigated in [23], as well as alternating least squares 
methods (ALS) for the fitting problem 

min \\X -IC; U^^\U^^Ku^^^i\\l s.t. C/^^^ columnwise orthonormal, (13) 

where X e <S>l=i 1^"" is given and C e <S>l=i ^''^ U^''^ ^ WrX>-p to be computed. 
Another algorithm is the so-called higher order orthogonal iteration (HOOI), an ALS algorithm, [22], 
which can also be used for the purpose of recompression (namely computing a quasi-optimal Tucker 
approximation of lower rank to a given Tucker tensor). Algorithmic variants of the HOOI were investi- 
gated in [4]. 

Using the orthonormality of the factor matrices and rewriting the objective in (13) gives 

\X - IC; U^^\ U^^\ U'^^^jWI = \\X\\^p + \\C\\l -2(xxi U^^'>'^ U^^^'^ X3 U^^'>'^ , . (14) 

Its gradient w.r.t. to C is given as 

dc - IC; U^^\ U'-^K U^^hfp = 2(C - a: xi U^^^^ X2 U^^^^ X3 U^^^^), (15) 
which attains zero for 

C = X xi U^^^^ X2 U*^^^^ X3 U^^^'^ , (16) 
giving a necessary condition for the optimal choice of the core. 

By inserting (16) into (14) problem (13) can be recasted as a maximization problem [22], [24], i.e. 

2 



max 



X Xi U^^'>^ X2 U^^''^ X3 U^^'>^ s.t. f/^P^ columnwise orthonormal. (17) 



An alternating least squares approach for solving (17) can easily be derived by alternately fixing all 
but one factor matrix and solving for the remaining by an SVD-approach, see [22] and Alg.l. In the 
description of Alg. 1 we assume the (common) convention of descending ordered singular values. It is 
enough to compute the so-called economic sized svd, namely, in the case Yia^p fi < np only the first 
n,^p ri columns of U have to be computed and X € K.n,*p nxUi*,, n . analog for the case O/^tp ^ "p- 
The SVD is truncated in such way that the relative error in the Frobenius norm is smaller than the given 
tolerance of e/ V3. HOOI converges to a solution where the objective function of (14) ceases to decrease; 
in fact, the convergence of HOOI to a global optimum, not even to stationary points, is not guaranteed 
[24], [23]. Nevertheless, to our experience Alg.l works well in practice and mostly yields better results 
than HOSVD (even for random initialization). An efficient generalization of Alg.l to recompression, 
i.e. the case where X is already in Tucker form, is straight forward. 

HOOI can also be used for approximate addition of Tucker tensors. Assume X is a sum of two Tucker 
tensors with equal mode sizes (Block CP [BCP], [22], [20]), i.e. 

X = ID; V^^\ V^^\ V^^) I + |[£; W^^\ W^^\ W^-^^ |. (18) 

Mode multiplications and matricization have to be performed with respect to the BCP format, which can 
be done elementwise (summand by summand). 
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Algorithm 1 HOOI (ALS); HOOI(^) 



Require: X e (^^^j R"'', 6 > 0, rp^ e N>i, p = 1 ... 3 (initial guesses for j-ranks) 
Ensure: C e (g)J^j R'", U^p^ € R'^i'^^'p, p = l...3 

1: Initialize U'-P^ e R"i''"'i'o, p = l...3 e.g. by HOSVD or random 

2: repeat 

3: for 7? = 1 ... 3 do 

4: }/ xj^j i7(^')^ 

5: {U, Z) ^ svd(J/(p)) (where R'vxn,*,, n 3 j/^^^ ^ ij-^y-^ 

6: rp ^ minir | ||X(1 : r, 1 : r)||f - ^/z^^ < IPIIf ) 

7: [/(P) ^ [/(:, 1 : r^) 

8: end for 

9: until fit ceases to improve 

10: C^XXi [/(1)^X2 t/<2)^X3 t/^^^^ 



3 Kronecker product approximation for long-range interactions 

We will derive a Kronecker product approximation (of. (8)) for the potential operator (3) by certain 
quadrature of an integral representation of the convolution kernel. 

In order to motivate the strategy, let us assume a multivariate function / - f{p) : — > R where 
p = Wxlf € [a,b] c R and the integral representation 

f(p)= f g{T)ef''^'^dT. (19) 
Jr 

If we can apply quadrature to (19) with nodes and weights (tk,cok), k = 1 .../?, we obtain a separable 
representation of /, i.e. 

R R d 

m « 2 Sitk) eP'^^'^^ = 2] a;, g{tu) ]~[ e^"'"''^'^\ (20) 

k=\ k=\ p=\ 

The quadrature order R refers to the separation rank of (20). 

In the following we derive a separable representation for the convolution operator in (3) after applying 
a collocation scheme. This representation leads to the desired Kronecker product form of the operator, 
see Corr. 4. 

3.1 Notation 

Let a = Xp=i ^^''^ c R3 with Q^p* = {ap,l3p\ c R and assume for /? = 1 ... 3 a partition of D!^p^ into 
np sub-intervals \ i = \ . . .np. On the resulting tensor grid T := W''^^ x W^^^ x W^^^ of Q where 
W^P^ = X"=i we define sets of collocation points {^[''^ € I^.''-' : i - \ . . . np], p = 1 ... 3 (for ease of 
presentation, one collocation point per sub-interval, e.g. midpoints). 

We further denote the number of collocation points = (^j^'\ ^|^^), i = (ii , ij, is) with N := Yip "p- 
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3.2 The collocation scheme 

Using the tensor product basis functions (e.g. il/^f^ '■= Xf p) indicator function of sub-interval I^^^) 



p=i 



and the ansatz cf. [1 1] (mf = m^i\^j), j = (juji, js)) 



(21) 



(22) 



our collocation scheme for (3) takes the form (i = {11,12, h), j = (ji, 72, J3)) 

3 



where 



/^\a:,j) := 



^iq) _ y(q) 



Lemma 1. Let m e C^{Q.). Then the collocation scheme (23), where are the midpoints of IV" i 
I . . .np, converges quadratic ally. 

Proof. Let us assume (w.l.o.g) uniform spacings in each dimension, i.e. hp := Ijnp, and use the notation 
h := maxp=i 3 hp. Furthermore, let Q.j := Xp=i ^'f''- 

We estimate the local error e{^i) - - ^(^,-)| (cf. (23) and (3)) for each fixed i and q in (23) separately, 
i.e. we define 



(23) 



(24) 



(P) : _ 



£ (mf - m(^)(j)) g'-''\^i,y) ^Pjiy) dy 

By using Taylor expansion for m^i^ at 'source' points ^j, i.e. 

m(^)(j) - m^'i%) + {Vm^^^K^jly - ^j) + Oi\\y - ^j\f), 
we obtain (Ci > independent ofh) 

e,i^i) < V r (ym^'X^j), y - ^j) g^'\^i, y) if^jiy) dy +C, f \\y - ^jf g'''\ii,y) i^jiy) dy 
j Jn Jn 

It can easily be seen that g'-'^\^i, .) € LP(Q.j)for all j and I < p < 3/2. 
Hence, the second term in (28) allows the estimate 

z r 11^ - s^'^^^-y^ ^j^^ < 2] oiLi.n.) r iij - ^jiy) dy 

JQ. ; ^ Jo, 



(25) 



(26) 



(27) 



(28) 



\LHfl) 



dy<U''\^i,. 



\LH.m 



\Q\h^ = 0{h^). 



(29) 
(30) 
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For the first term we distinguish between the diagonal (i = j) and non-diagonal (i j) case. The 
kernel g^'^\^i, .) is analytic in the case i + j and thus allows Taylor expansion, i.e. 



g'^\^i,y) = g"'\^i,^j) + 0{\\y-^j\ 



(31) 



The constant term in the expansion does not contribute to the first term in (28) since y — is odd w.rt. 
in Q.j. We get (C2 > independent ofh) 

2] r {Vm^''%),y-ij)g^'\€i,y)il^j(y)dy <\\Vm^'%)\\Y, f 0{\\y-ij\f)<C2\n\h^ ^0{h^). 



(32) 



In the case i = j we have for the first term in (28) 



r {Vm'^'>\ii),y-€i)g^''\€i,y)^i(y)dy < ||Vm(?)(^,.)|| r \\y-€i\\\g^''\€i,y)\il^i(y)dy<C2{hfm 



), 
(33) 



where {hmd := L dy with f(y) - - y('^))«A,-(3') e LP{Q.), p > 1 with compact support O,-. 

Since 1/ ||jir||^ e LP{Q.), \ < p < 3/2, we proceed with 



ihmi. 



dx - 4k h , 



(34) 



which completes the proof. 



Fig. 1 shows the quadratic convergence of the error of the collocation scheme (23) compared to (3) 
evaluated at the origin^ for the radial symmetric function m{x) = exp(- \\x\\^). 
The evaluation of the potential on T by abbreviating notation for the matrices G^^^ € R'^^^ with 
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Figure 1: Absolute error of collocation scheme (23). 



entries 



G^= f 8'-'\^i,y)^j(y)dy 



(35) 



^We used Maple 14 and evaluated the expression (3) at the points f = 1 /2(h, h, h) with h= i/n. 
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and the grid-sampled magnetization components trf^^ € is given as 

1 ^ 

(l>--—yG^''^m^"\ (36) 
An ^ 

q=\ 

yielding computational effort 0{N^) if no special structure on G^^^ can be imposed. 
We will use the term (potential) operator for 

1 ^ 

P : X X ^ R^, {m^^\ m^^\ m^^^) ^ -— V G^'^^ m^"\ (37) 

An ^ 

q=\ 

In the following we use Gauss-Transform (cf. [11]) and sinc-quadrature [29] to construct a Kronecker 
product structure for (37) with an asymptotically optimal approximation error yielding a 'tensor' version 
f of (37) that operates on structured tensors. 



3.3 Sine quadrature 

We state some basic facts from the theory of sine function based approximation, see [29], [18]. 
The sine function sinc(;c) := 515(5^) ^n analytic function which is 1 at a; = and zero at jc € Z \ {0). 
Sufficiently fast decaying continuous functions / € C(R) can be interpolated at the grid points x = k& e 
& > (step size) by functions Sk,ffix) := smc{x/& - k), i.e 

/^(x)-2]/(^'^)^^.*W- (38) 

fceZ 

Since ^ sinc(0 At = \, the interpolatory quadrature for j^ f{t) dt leads to 



J 

Jr 



mdt^&Yjf^''^^^ (39) 

k€Z 



which can be seen as infinite trapezional rule. Truncation to ^ = -R . . .R of the infinite sum in (39) 
leads to the sine quadrature rule with 27? + 1 terms with the truncation error X|/t|>R fi^'^') that obviously 
depends on the decay-rate of / on the real axis. 

For functions / e H^{Dg), 6 < njl (Hardy space), i.e. which are holomorphic in the strip := {z € 
C : |3z| < 5) with 

N(f,Ds):= [ \f(z)\\dz\= [ (\f(t + i6)\ + \f(t-i6)\)dt<c^, (40) 
JdDs Jr 

and in addition to / € H^(Ds) have double exponential decay on the real axis, the following exponential 
error estimate for the sine quadrature holds (cf. [18], Proposition 2.1), which we state for sake of 
completeness. 

Theorem 2 ([18]). Let f e H^(Ds) with some 6 < n/2. Iff satisfies the condition 

1/(01 < C exp(-be"^'^) e R with a,b,C > 0, (41) 
then the quadrature error for the special choice - log(^^)/(a/?) satisifies 



f f(t)dt-i^Y,f(M) 

\k\<R 



CN(f,Ds)e.p( -^ffj . (42) 
^log(2naR/b)' 



Remark. In the case f(p) as in (19) the constants in (42) depend on p. For some fixed p, an accuracy 
of e > can be achieved with R = 0(\ log e| • log | log e|). 
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3.4 Separable approximation of G^^^ 
We make use of the Gaussian transform 



i = 4 r -'-"^'^'dr, (43) 
to obtain for p = ||^,- - j|p and q - 1 ... 3 the new representation for (35) 

3 

Gf = A r ^2 rr ^ip) dr . A r f^r) dr, (44) 



with 

I e 

'jp 



(45) 



Lemma 3. After applying the substitution t - sinh(?) the transformed integral (44) with integrand 
f(t) := cosh(0 /(sinh(f)) allows an exponentially convergent sine quadrature, cf. Theorem 2, where the 
constants in (42) depend on the parameters in (45). 



Proof. We assume w.l.o.g. q = ^ and (45) to be transformed to integrals over intervals {ap, bp], i.e. 
ap :- ^i^^ - jphp and bp :- - {jp - \)hp, where hp is the length of the interval I'^p, which w.l.o.g. 

is assumed to be constant on the p-th axis. We set C := ny=i['^7>^7] Y\^j=\\-hj 12,13] — aj] (assume 
midpoints as collocation points); then the transformed integrand in (44) reads 

fit) = 1 x('^ cosh(0 sinh2(0 exp ( - sinh2(?) ^ x^^)^) d{x'^^\ x^^\ x^^^). (46) 
We obtain analytically up to constants 

fit) oc ^^^{e-''' '^'"h^W - e-''' ''''^'''''>)(erfib2 sinh(O) - erf{a2 sinh(0))ferX^3 sinh(O) - etf^a^ sinh(O)), 
sinh"^(0 ' 

(47) 



where the so-called error function is defined as 



erfix):=^ f e-^' dr. (48) 



The functions sinh(z))/ sinh(z), exp( - sinh^(z)) and cosh(z) are all entire functions, hence f is 
holomorphic over C. 
Moreover there holds (a > 0) 

exp ( - a^ sinh2(0) = <9( exp(-^e2l^l)) as \t\ oo (49) 

and using asymptotic expansion for the error function ^ gives (a,b > 0, a b) 

. ^ . ,/exp i - a^ sinh^(?)) - exp { - b^ sinh^(?)) \ 

erfib sinh(O) - erfia sinh(O) - 0{-^ — —] as \t\ oo, (50) 

^ sinh(0 ' 

4erf(x) « 1 - SiE<z£)(i _ 1 + 3 _ ^ ^ ^ ^ ) _^ ^ 
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which shows the required double exponential decay for f. 

It remains to show that N{f, Dg) < oo. Wegetforz. = t±i6, H := Yj]=ih^j/4andCo = Y[]^i[hj/2,/3j-aj] 



J J /^'> cosh(f ± iS) sinh^Cf + iS) exp ( - smh^(t ± id) ^ x^^'>^) d{x^^\ x^^\ x^^^) 



7=1 

ICol ^ 



dt < (51) 



^ |cosh(f + i5)\ |sinh^(f + i5)\ |exp ( - sinh^(f ± iS)H)\ dt < oo, (52) 

since the remaining integrand is a smooth function in t with (double) exponential decay as \t\ oo. 
This completes the proof. □ 

Remark. Since in an estimate for N{f, Dg) a term like \ exp (-a^ sinh^(f±/(5))| - exp ^-y ( cos(25) cosh(2f)- 

1)^ can have positive exponent (e.g. t close to zero), the norm cannot be estimated uniformly in hp, pro- 
hibiting an exponentially convergent sine quadrature for hp 0. Nevertheless, for the grid assumed 
to be fixed. Lemma 3 gives us a Kronecker product approximation of the operator (37) with separation 
rank R, see Cor 4, where R = 0{\ log e| • log | log e\)for a prescribed accuracy e. 

Corollary 4. The operator (37) admits a Kronecker product approximation of the form (8) with rank R, 
where R - 0{\ log e\ ■ log | log e\)for a prescribed accuracy e. 

Proof. The substitution r = sinh(f) preserves the symmetry in the integrand (43), leading to a R + 1- 
term sine quadrature after applying Lemma 3. Omitting the first term, which is zero, leads to the R-term 
representation for the potential operator P : X^=i (S)p=i ^ ^^^=1 ('^f- *^^^) '^"'^ i^^)) ^^^^ ^ 



R 

with 



iU^^\j^ = hlP^j^{smh(tt)) and at = cosh(tk) sinh^t;,), (54) 
where t^ = k&for •& = cq log{R)/R and appropriate cq > cf. Theorem 2. □ 



The evaluation of one component of P (cf. (53) for a rank-1 tensor, i.e. X = v^^^ 
lv^^\ v'-^\ v^^^l e C„,i, is given as 



and amounts in a computational cost of 0{R Z^^j «p). 

It is not far to seek a reduction of this complexity by reducing the cost for the matrix- vector product (cf. 
Sec. 4) or reduction of the separation rank R (cf. Sec. 3.5). 

By using the relations^ [5] (we assume appropriate dimensions for the involved matrices) 

(Ai(8)A2)(BiOB2) (56) 
(Ai®A2)(Ci®C2) -AiCi®A2C2, (57) 

'One notes that with/ = ii+(i2-\-)ni+{h-l)n\ andy = 71 +O2- +O3- 1)^2 the entries of a matrix A 6 ^l-^^i'^K-i"" 
given by Ujj = af \ af\^af\ correspond to the (/, /)-entry of A<^' ® A<^> igi A*''. 
*The symbol stands here for the Khatri-Rao product, cf. Sec. B 
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and 

wec[U; V^^\ V^^\ V^^^) = (V^^^ O V^^) © vd))^, (58) 

respectively 

vec(|[C; V^^\ V^^\ V^^^j) = (V^^^ ® V^^^ » V^^^)vec{C), (59) 
we get for the evaluation of (53) for X e Cn,r (also compare with [1 1]) 

1 ^ 

p('J)X - — ^ 2 a, li; U'-^'W^'K U^^'W^^K U'^^'\^^\ (60) 

respectively for X e T„ r the formula 

1 ^ 

= -- L y a, EC; uf^v^'K u^fv^^K t/^^' V^^^i. (6i) 

3.5 Some practical issues 

Here we briefly address the question how to choose the rank R. 

If we apply sine quadrature to the Gaussian transformed kernel (43), we can adaptively determine the 
rank by contoUing the relative error of the quadrature in an interval p € [pmin.Pmax] corresponding to the 
mesh size parameter h by the relation p = ||^,- - j|p > 3h^/4 (cf. proof of Lemma 3 Eq. (46)). Since 
we can scale the computational domain to unity, we considered in Fig. 2 p^ax ^ 3, corresponding to 
^scaled = [0, 1]^. We obscrvc an uniform relative error bound for h greater some /zmin- Also compare 
with [11]. Moreover, we see from Fig. 2 the exponential decay of the error w.r.t. the rank (the logarithmic 
error is a decreasing afiine function of the rank). 

The separation rank can also be reduced by applying recompression of the CP representation (cf. 
Sec. 2.1) corresponding to (53), either by compressing (53) as Tucker tensor by Alg. 1 and subsequent 
approximation of the resulting core to CP by optimization based algorithms [3] yielding a canonical 
tensor with smaller rank, or direct CP approximation to a smaller rank. The resulting CP representation 
again corresponds to a Kronecker product representation. 

4 FFT acceleration 

Here we address the question how to use FFT in order to reduce compuational costs for evaluating the 
operator P. 

4.1 Multidimensional DFT for structured tensors 

Again we present the following for the case of order-3 tensors; everthing is also valid for the general 
case of order-(i. 

For a given tensor X e (^^^j R-"'' the 3-d discrete Fourier Transform (DFT) results in the complex 
tensor X e (^^^^ C"'' and is defined as 

n[ «2 "3 

;i = l 72=1 ;3 = 1 
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Figure 2: Relative error for sine quadrature of (43) with substitutuion t = sinh(f). cq = 2.15 

kpip 



where oj,, - e 



lip 

The inverse discrete Fourier Transformation (IDFT) is given by 

n\ «2 "3 



1 "i "J 

"i"2«3 .f^i 

FFT variants with zero-padding are commonly used because of their almost linear scaling, i.e. OiY^^^^ log(?ip) 0^=1 ^q)- 
The convolution theorem holds, i.e. in multi-index notation and using the operator symbols and 
for the DFT and IDFT respectively 

^ * 7C„ - r-\T{X) ■ Ti%% (64) 

where the subscript denotes the n- shift and • the elementwise product {Hadamard product). 
The following is straight forward to show. 

Lemma 5. For a canonical tensor X = U^^\ U'-^\ U'-^^ | e Cn,r, xj^pjj, = 'Ej'i=i '^j "^1/ "^3/ the 
DFT is given by 

X^IA- rid{U^% TidiU'-^^), TidiU^'^) I, (65) 

where the (1-d) Fourier transform fj^ is only taken along each column of a factor matrix. 
The IDFT is given as 

X^IA; Tf.'iU^^rfJiU^^^irfJiU^'^n. (66) 

Proof. 

r n[ m H3 

1=1 ;i=i 72=1 ;3=i 

and analog for the IDFT. □ 
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Lemma 6. For a Tucker tensor ?( - |[C; U^'^\ U^^\ U^^^ eT„r, x^un ^ ILWl , c//,-' u^^\, 
the DFT is given by 

X = IC; Tid{U^\Tid{U^'^^),TM{U^^^n, (68) 

where the (1-d) Fourier transform fid is only taken along each column of a factor matrix. 
The IDFT is given as 

Proof. 

ri,r2,n "1 '!2 "3 

7'iJ2J3 = l 71 = 1 j2 = l 73 = 1 

a?i<i analog for the IDFT. □ 

The DFT as well as the IDFT for structured tensors is therefor basically one-dimensional, more 
precise, assuming FFT and IFFT implementations, the costs are 0{R np log np) for canonical tensors 
and 0{Yjp rpnp log np) for Tucker tensors respectively. This matches asymptotically the costs for one 
dimensional DFFT times the rank constants. 

4.2 Convolution kernel and FFT evaluation 

The evaluation of the operator (53) scales quadratically in the number of collocation points in one dimen- 
sion. Even though this is super-optimal, also refered to as sub-linear (cf. [11] or (slightly misleading) 
as super-linear (cf. [15]), we can still reduce this complexity (on uniform grids) by observing a convo- 
lution in (3). 

Let us assume that hp is constant for each p and the collocation points to be the midpoints of the intervals 
/(P) = /(P) 

7p 

By applying the substitution used in the proof of Lemma 3 we get for the functions in (45) 



h\P' .{T) = hf.{T) = \ 22 (71) 



where ^ {ip - jp)hp - ^ and bi^^j^ = {ip - jp)hp -1- ^. 



For ip, ip - \ . . .np these are 2np - 1 different integrals. We therefor set ip - jp = Jp = I . . . 2np - 1 and 
identify the convolution kernel G^f\ e (S)p=i with entries 



Lemma 3 also holds for (72) (after the substitution t = sinh(f)) and hence we can apply sine quadrature 
leading to (cf. (37) and (44)) the canonical representation 

P^"^ = —^l^; D^?, Df,Df I e C2„-i,«, (73) 

with 

{Df)j^^ = h^f^{&mh{tk)) and At = cosh(fi) sini?{tk), (74) 
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where tk - k-& for & - co log{R)/R and appropriate cq > cf. Theorem 2. 
;P^*^ is a separable convolution kernel. 

By denoting the elementwise product (Hadamard product) for tensors with • and using DFT, the evalua- 
tion of one component of !P (cf. (73)) for a tensor X e (S)p=i with (zero-padded) Fourier transform 

Xe (g)p=iC2""-i reads 

^('?) * A' = - -^I D^/\ , ] • ^ (75) 

The complexity of FFT-based convolution in (75) with an unstructured tensor X is dominated by the costs 
for the FFT of X, and amounts in a complexity of 0{R Zp=i log f^p 0^=1 f^q)' which is comparable with 
usual FFT-based computation of the scalar potential (except the constant R, the order of sine quadrature), 
cf. [2],[1]. 

For structured tensors X the complexity of FFT-based convolution with the separable kernels (73) scales 
like np log(np) in the mode sizes due to the Lemmas 5 and 6 of Sec.4.1. 

We can think of recompression of (73), e.g. efficient compression of a CP tensor to a Tucker tensor 
by Alg.l or to the Tensor Train format (TT) C, if the magnetization itself is represented in Tucker 
representation. The (complex) Hadamard product in Fourier space can be performed efficiently with the 
techniques described in Sec. B.l or B.2, depending on the structure of the magnetization component 
tensors. 

Choosing a large number of quadrature terms (oc rank R) becomes a feasible option, which results in an 
accurate representation of the operator (37). Since the computation of (37) as well as the recompression 
of the discrete kernels (73) are done in a setup-phase in a micromagnetic simulation, the higher amount 
of work due to higher ranks does not significantly influence the overall computing time. 



Algorithm 2 Scalar potential DFFT; potentialCM^^^ At^^^ Al^^\ h, R, cq, tol > 0) 

Require: M^p'> e T^^^d'h /jp > (p = 1 . . . 3), 7? € N, co > 0, tol > 
Ensure: $ € T„y 
Setup 

• Compute the CP kernels (73) 

• Compress the kernels to Tucker tensors with tolerance tol 

• Compute DFFT of Tucker kernels by using Lemma 6 

Actual computation 
for ^ = 1 ... 3 do 

• Compute DFFT of ^-th Tucker magnetization component by using Lemma 6 

• Compute Tucker Hadamard products of magnetization component and kernels in Fourier space 
using e.g. the tolerance tol 

• Compute IDFFT for result of previous step 
end for 

Perform approximate summation of results in previous step using e.g. the tolerance tol 



Alg. 2 describes the FFT-based procedure for computing the scalar potential for Tucker magnetiza- 
tion. Fig. 3 shows results on complexity and accuracy using Alg. 2 for the case of uniform magnetization, 
cf. [11]. We observe the quasi linear complexity {n log n), while the relative error in the energy decreases 
with order about 1.5.^ Fig. 4 shows logarithmic rank-grows for the stray field (averaged over modes) 

'The discretization for the energy as well as the stray field calculation from the potential were carried out second order. 
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Figure 3: Complexity and relative error in the energy for Alg. 2. 
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Figure 4: Ranks of the computed stray field using Alg. 2 with accuracy le-12 and a flower-like magne- 
tization state. 
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induced by a non-trivial (flower-like) magnteization state [2] while using an accuracy of le-12 in Alg. 2. 
Computations were performed in parallel in Matlab 7.11.0 using the classes provided by the Tensor 
Toolbox V.2.5 [6] on a Linux Workstation with a Quad-Core Intel 17 processor and 6 GB RAM. 

5 Conclusion 

We gave a detailed review of tensor formats and their use in approximation of analytical operators. A 
quadratically convergent collocation scheme on tensor product grids for the micromagnetic potential 
operator is proved to have a Kronecker product approximation with exponential convergence in the 
separation rank. This is a mathematically rigorous confirmation of the algorithm given in [11]. The 
discrete Fourier transform for structured tensors can be used for the purpose of accelerating the method 
on uniform grids, yielding quasi linear complexity in the number of collocation points in one dimension. 

A Best approximation 

Definition 1. Let {V,d) be a metric space and % i= U Q V. A best approximation ofve V in U is an 
element u* e U such that 

d{u,v) - d{U, v) ■- mf{d{u, v) : u e U), (76) 

i.e. the infimum is attained in U. 

If y is a normed vector space the condition (76) reads: || v - m* || < || v - m || for all u e U. 

Lemma 7. Let (V, || . ||) be a normed vector space with dim{V) < oo and % i= U Q V a closed subset. 
Then, for allveV there exists an element utest ^ U with \\v - utest II < II v - m ||/or all u € U. 

Proof. Let v e V andu € U and define D :- U Ci {u e V : ||v - m|| < ||v -"mID Q U. Since D is 
non-empty (u e D), bounded (triangle inequality) and closed (D is intersection of two closed sets) and 
hence compact^, the continuity of the function f : D ^ K, f(u) := || v - m|| together with the extrem 
value theorem^ ensures that f attains its minimal value at Uh^st & D Q U. □ 

B Hadamard product for structured tensors 
B.l CP tensors 

For two canonical tensors ?( ^ lA; U^^\ U^'^\ U^^'> J € C„,, and J/ = Ifi; V^'^\ V^^\ V^^'> | € C„,r' of 
equal mode sizes the Hadamard product (elementwise product) is 

i'^ ■ ^)uK - Z Z ^i^il^ ("^^g) - I w^^^^ w^^'^ w^^'^ 3 ^ (77) 

i 

where v = [A,n] and W^p^ = [wf ^ • vf ^ | . . . {u^' • v^f Y ■ ■ "r''^ • v|.f^]. The cost for forming (77) is of 
order 0{rr' H^^j np). The new CP tensor has rank rr'. Direct recompression can be considered e.g. by 
optimization [3]. 

*This conclusion fails in infinite dimensional normed vector spaces in general. 
'The image of compact sets under continuous functions is compact. 
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B.2 Tucker tensors 

For two Tucker tensors ^ - I C; U^'^l U^^\ U^^^ | € T„^r and J/ - I D; V''^\ V^^), y^^) | e T„^r' of equal 
mode sizes the Hadamard product (elementwise product) is 

(-^ • ^JK = Z Z '^J' ^u^n) (78) 

ij,k l,m,n 

The Hadamard product (78) can be written in compact form with the Khatri-Rao product. 
Definition 2. Given two matrices A e R^^^ and B € "SJ^^, their Khatri-Rao product is given by 

AQB = [_a-A®b;i a.,2®b;2 ... a-.K^b-.K]^"^"""^. (79) 
It is straight forward to show 

where £ is the reshaped tensor product of C and D, i.e. e(ii)(jm)(km) - Cijk dimn- The costs for computing 
(80) are therefor OiY^l^, ''p^'p^'p + ^l=i '"p^'P" 

The new Tucker tensor has ranks r ■ r'. It is therefore practical to recompress the original cores before 
building the tensor-product; also recompression of (80) is highly recommended if further operations 
with ■ J/ are planned. Indeed, in practice often very effective recompression of the core £ and/or 
Hadamard product itself is observed. 



C Relation between Hicker tensors and Tensor Trains (TT) in 3 dimen- 
sions 

A Tensor Train ^ (TT) [27] in d dimensions is given as 

aiii2...id = Gi(/i)G2(/2) ■ . ■ Gdiid), (81) 
where Gk{ik) is an r^-i x r^ matrix with ro = r^ = I. Writing out the products leads to 

<^iih...id= Gi(^i^'^i^G2{ai,i2,a;2) ■ ■ -Gd-iiad-iid-iad-OGdiad-iJd)- (82) 

ai,...,aj-i 

There holds a quasi-best approximation result due to best rank-r^^ approximation of the unfolding ma- 
trices of [27]. Recompression (rounding) and CP2TT is also possible [27], as well as black-box 
approximation by AC A type algorithms [26]. 
In three dimensions (82) reduces to 

'^'ihh ^ Z Gi{ii,ai)G2{ai,i2,a2)G3(.a2,i3)- (83) 

From (83) we conclude 

^-^XiGi X3G[ = |[^;Gi,id,G[|, (84) 

where ^ denotes the 3-tensor G2 e (S)p=i where ntp = rp, p 1 and m2 = ?i2- Recompression of 
(84) leads to a (r'j, r3)-Tucker tensor. 

The representation (84) is also known as Tucker! decomposition of a tensor [22]. 
On the other hand a Tucker tensor can also be easily converted to a TT, i.e. by mode-multiplication of 
one factor matrix (e.g. the second factor) we immediately get the form (84). Recompression by TT- 
rounding can be done afterwards. The Hadamard product (as well as many other arithmetic operations) 
can be carried out efficiently in TT-format, [27]. 
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